| Week | Topic |
|---|---|
| 1 | |
| 2 | |
| 3 | |
| 4 | Plotting using R, basics |
| 5 | Plotting with R, tuning plots |
| 6 | Plotting Phylogenetic trees with R |
| 7 | Plotting Phylogenetic trees with R |
| 8 | Data analysis - Continuous data |
| 9 | Data analysis - Linear regression |
| 10 | Data analysis - Categorical data |
| 11 | Data analysis - Logistic regression |
| 12 | Data analysis - Time-to-event data and survival |
Load required libraries
library(tidyverse)
library(table1)
library(knitr)
library(arsenal)
library(patchwork)
library(GGally)

Load data from a CSV file.
dfall <- read_csv('data/data.csv')
# Coerce the sequencing metrics to numeric and fix the order of the factor levels.
# across() replaces the deprecated mutate_at(..., funs(...)) idiom.
dfall <- dfall %>%
  mutate(
    across(c("TCS_PR", "TCS_RT", "TCS_IN", "TCS_V1V3", "PI_RT", "PI_V1V3", "DIST20_RT", "DIST20_V1V3"), as.numeric),
    racecat = factor(racecat, levels = c("White", "Black", "Hispanic", "Other/Unkn")),
    risk2 = factor(risk2, levels = c("MSM", "HET-F", "HET-M", 'PWID-F', 'PWID-M', 'OTHER/UNKN'))
  )
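The two conversions in this step can be checked on small made-up vectors. The sketch below uses base R only, and all values are invented for illustration. It shows that `as.numeric()` turns text that is not a number into `NA`, and that the `levels` argument of `factor()` sets the display order and turns any value not listed into `NA`.

```r
# Invented example values, not taken from data.csv
raw_counts <- c("12", "305", "n/a")
counts <- suppressWarnings(as.numeric(raw_counts))  # "n/a" cannot be parsed
is.na(counts)

race <- c("Black", "White", "Asian", "Hispanic")
racecat <- factor(race, levels = c("White", "Black", "Hispanic", "Other/Unkn"))
levels(racecat)                  # order follows the levels argument, not the alphabet
table(racecat, useNA = "ifany")  # "Asian" is not a listed level, so it is counted as NA
```

Checking for unexpected `NA` values after a step like this is a cheap way to catch misspelled categories before they silently drop out of a summary table.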
# Attach display labels to the columns; arsenal uses them when printing the table
labels(dfall) <- c(ngscollectyr = 'Year of Diagnosis',
                   gender = 'Gender',
                   racecat = 'Race',
                   age_cat30 = 'Age ≤ 30y/o',
                   risk2 = 'Risk factor',
                   recent_cat = 'Recency Category',
                   owning_jd_region_fsu = 'Region in NC',
                   incluster = 'In Clusters',
                   cd4_value = 'CD4 count (cells/µL)',
                   vl_log_value = 'Viral Load (Log10 copies/mL)'
)

Summarize the data, perform statistical comparisons, and generate a publication-ready table.
Table 1. Characteristics of sequenced participants with new diagnoses in NC from 2018-2021.
dfall %>%
  tableby(ngscollectyr ~ recent_cat + gender + racecat + age_cat30 + risk2 + owning_jd_region_fsu + incluster + cd4_value + vl_log_value,
          data = ., cat.simplify = FALSE, numeric.stats = c("median", "q1q3"), test = TRUE) %>%
  summary(., digits = 1, digits.count = 0, digits.pct = 1, digits.p = 2, title = NULL)

| | 2018 (N=270) | 2019 (N=237) | 2020 (N=112) | 2021 (N=195) | Total (N=814) | p value |
|---|---|---|---|---|---|---|
| Recency Category | | | | | | 0.02 |
| &nbsp;&nbsp;&nbsp;Chronic | 131 (48.5%) | 109 (46.0%) | 43 (38.4%) | 97 (49.7%) | 380 (46.7%) | |
| &nbsp;&nbsp;&nbsp;Indeterminant | 44 (16.3%) | 21 (8.9%) | 16 (14.3%) | 31 (15.9%) | 112 (13.8%) | |
| &nbsp;&nbsp;&nbsp;Recent | 95 (35.2%) | 107 (45.1%) | 53 (47.3%) | 67 (34.4%) | 322 (39.6%) | |
| Gender | | | | | | 0.08 |
| &nbsp;&nbsp;&nbsp;Female | 44 (16.3%) | 29 (12.2%) | 7 (6.2%) | 29 (14.9%) | 109 (13.4%) | |
| &nbsp;&nbsp;&nbsp;Male | 222 (82.2%) | 202 (85.2%) | 100 (89.3%) | 163 (83.6%) | 687 (84.4%) | |
| &nbsp;&nbsp;&nbsp;Transgender Female | 4 (1.5%) | 6 (2.5%) | 4 (3.6%) | 3 (1.5%) | 17 (2.1%) | |
| &nbsp;&nbsp;&nbsp;Transgender Male | 0 (0.0%) | 0 (0.0%) | 1 (0.9%) | 0 (0.0%) | 1 (0.1%) | |
| Race | | | | | | 0.04 |
| &nbsp;&nbsp;&nbsp;White | 46 (17.0%) | 38 (16.0%) | 26 (23.2%) | 29 (14.9%) | 139 (17.1%) | |
| &nbsp;&nbsp;&nbsp;Black | 191 (70.7%) | 147 (62.0%) | 70 (62.5%) | 126 (64.6%) | 534 (65.6%) | |
| &nbsp;&nbsp;&nbsp;Hispanic | 26 (9.6%) | 30 (12.7%) | 10 (8.9%) | 28 (14.4%) | 94 (11.5%) | |
| &nbsp;&nbsp;&nbsp;Other/Unkn | 7 (2.6%) | 22 (9.3%) | 6 (5.4%) | 12 (6.2%) | 47 (5.8%) | |
| Age ≤ 30y/o | | | | | | < 0.01 |
| &nbsp;&nbsp;&nbsp;No | 109 (40.4%) | 76 (32.1%) | 33 (29.5%) | 90 (46.2%) | 308 (37.8%) | |
| &nbsp;&nbsp;&nbsp;Yes | 161 (59.6%) | 161 (67.9%) | 79 (70.5%) | 105 (53.8%) | 506 (62.2%) | |
| Risk factor | | | | | | < 0.01 |
| &nbsp;&nbsp;&nbsp;MSM | 183 (67.8%) | 169 (71.3%) | 82 (73.2%) | 127 (65.1%) | 561 (68.9%) | |
| &nbsp;&nbsp;&nbsp;HET-F | 39 (14.4%) | 25 (10.5%) | 5 (4.5%) | 6 (3.1%) | 75 (9.2%) | |
| &nbsp;&nbsp;&nbsp;HET-M | 38 (14.1%) | 28 (11.8%) | 6 (5.4%) | 9 (4.6%) | 81 (10.0%) | |
| &nbsp;&nbsp;&nbsp;PWID-F | 5 (1.9%) | 0 (0.0%) | 0 (0.0%) | 1 (0.5%) | 6 (0.7%) | |
| &nbsp;&nbsp;&nbsp;PWID-M | 5 (1.9%) | 9 (3.8%) | 8 (7.1%) | 5 (2.6%) | 27 (3.3%) | |
| &nbsp;&nbsp;&nbsp;OTHER/UNKN | 0 (0.0%) | 6 (2.5%) | 11 (9.8%) | 47 (24.1%) | 64 (7.9%) | |
| Region in NC | | | | | | < 0.01 |
| &nbsp;&nbsp;&nbsp;Asheville | 21 (7.8%) | 22 (9.3%) | 8 (7.1%) | 12 (6.2%) | 63 (7.7%) | |
| &nbsp;&nbsp;&nbsp;Charlotte | 48 (17.8%) | 12 (5.1%) | 5 (4.5%) | 11 (5.6%) | 76 (9.3%) | |
| &nbsp;&nbsp;&nbsp;Fayetteville | 26 (9.6%) | 32 (13.5%) | 17 (15.2%) | 19 (9.7%) | 94 (11.5%) | |
| &nbsp;&nbsp;&nbsp;Greensboro | 66 (24.4%) | 70 (29.5%) | 30 (26.8%) | 46 (23.6%) | 212 (26.0%) | |
| &nbsp;&nbsp;&nbsp;Raleigh | 54 (20.0%) | 57 (24.1%) | 19 (17.0%) | 52 (26.7%) | 182 (22.4%) | |
| &nbsp;&nbsp;&nbsp;Wilmington | 9 (3.3%) | 8 (3.4%) | 12 (10.7%) | 16 (8.2%) | 45 (5.5%) | |
| &nbsp;&nbsp;&nbsp;Winterville | 46 (17.0%) | 36 (15.2%) | 21 (18.8%) | 39 (20.0%) | 142 (17.4%) | |
| In Clusters | | | | | | 0.08 |
| &nbsp;&nbsp;&nbsp;No | 89 (33.0%) | 73 (30.8%) | 26 (23.2%) | 73 (37.4%) | 261 (32.1%) | |
| &nbsp;&nbsp;&nbsp;Yes | 181 (67.0%) | 164 (69.2%) | 86 (76.8%) | 122 (62.6%) | 553 (67.9%) | |
| CD4 count (cells/µL) | | | | | | 0.55 |
| &nbsp;&nbsp;&nbsp;Median | 407.0 | 432.0 | 428.0 | 394.0 | 419.0 | |
| &nbsp;&nbsp;&nbsp;Q1, Q3 | 275.0, 567.2 | 299.0, 604.5 | 281.0, 571.0 | 280.0, 532.0 | 288.0, 580.0 | |
| Viral Load (Log10 copies/mL) | | | | | | < 0.01 |
| &nbsp;&nbsp;&nbsp;Median | 4.7 | 4.5 | 5.1 | 4.7 | 4.7 | |
| &nbsp;&nbsp;&nbsp;Q1, Q3 | 4.1, 5.2 | 4.0, 5.1 | 4.7, 5.5 | 4.1, 5.2 | 4.1, 5.2 | |
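As a rough cross-check of the p values above, the counts in the table can be fed to a test by hand. The sketch below assumes the p values for the categorical rows come from Pearson's chi-squared test, which is the default for categorical variables in `tableby()`; the page itself does not state which test was used. It re-enters the "Age ≤ 30y/o" counts from Table 1 and runs base R's `chisq.test()`.

```r
# Counts copied from the "Age <= 30y/o" rows of Table 1 (No / Yes by year)
age_by_year <- matrix(
  c(109,  76, 33,  90,
    161, 161, 79, 105),
  nrow = 2, byrow = TRUE,
  dimnames = list(age_le_30 = c("No", "Yes"),
                  year = c("2018", "2019", "2020", "2021"))
)

res <- chisq.test(age_by_year)
res$parameter  # degrees of freedom: (2 - 1) * (4 - 1)
res$p.value    # falls below 0.01, in line with the "< 0.01" reported in the table
```

The same pattern works for any other categorical row, as long as the counts are entered with one row per category and one column per year.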
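# Keep the recency category and the four TCS count columns
df_tcs <- dfall %>% select(c(
  "recent_cat",
  "TCS_RT",
  "TCS_PR",
  "TCS_IN",
  "TCS_V1V3"
  )
)
# Violin plot of one TCS column by recency category, on a log10 y axis.
# `values` is the vector to plot (a column of df_tcs); `title` is the y-axis label.
tcs_chart <- function(values, title) {
  df_tcs %>%
    ggplot(aes(x = recent_cat, y = values)) +
    geom_violin() +
    geom_jitter(aes(colour = recent_cat), size = 1, alpha = 0.5) +
    scale_y_continuous(name = title, trans = 'log10') +
    labs(x = "Recency Category", color = "Recency Category") +
    theme_bw() +
    # Hide the x axis; the colour legend already identifies the categories
    theme(axis.title.x = element_blank(),
          axis.text.x = element_blank(),
          axis.ticks.x = element_blank())
}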
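The helper above is called with the column values themselves (for example `df_tcs$TCS_PR`) and reads `df_tcs` from the global environment. A common alternative, sketched below under the assumption that ggplot2 is installed, passes the data frame and the column name as a string and looks the column up with the `.data` pronoun. The function name `tcs_chart_by_name` and the toy data frame are hypothetical, made up only for illustration.

```r
library(ggplot2)

# Hypothetical variant of tcs_chart(): takes the data frame and a column name
tcs_chart_by_name <- function(df, col, title) {
  ggplot(df, aes(x = recent_cat, y = .data[[col]])) +
    geom_violin() +
    geom_jitter(aes(colour = recent_cat), size = 1, alpha = 0.5) +
    scale_y_continuous(name = title, trans = "log10") +
    theme_bw()
}

# Toy data (invented values, not from the course data set)
toy <- data.frame(
  recent_cat = rep(c("Chronic", "Recent"), each = 5),
  TCS_PR = c(120, 340, 95, 800, 410, 60, 150, 230, 75, 510)
)

p <- tcs_chart_by_name(toy, "TCS_PR", "TCS# PR")
```

Written this way, the function can be reused on any data frame that has a `recent_cat` column, which makes it easier to test on a small subset first.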
p1 <- tcs_chart(df_tcs$TCS_PR, "TCS# PR")
p2 <- tcs_chart(df_tcs$TCS_IN, "TCS# IN")
p3 <- tcs_chart(df_tcs$TCS_RT, "TCS# RT")
p4 <- tcs_chart(df_tcs$TCS_V1V3, "TCS# V1V3")
(p1 | p2) /
  (p3 | p4)

Example Tree 1

Example Tree 2